-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
gh-136421: Load _datetime
static types during interpreter initialization
#136583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I have no idea if this will fix the Windows build, but let's hope.
My concern is unnecessarily losing portability between the builtin module and the non-builtin one. This PR avoids a crash not by locking but forcing the main interpreter initialize the PyMODINIT_FUNC
PyInit__datetime(void)
{
+ if (PyDateTime_DateType.tp_subclasses == NULL) {
+ PyInterpreterState *interp = PyInterpreterState_Get();
+ // main interp currently checks a module before subinterp imports
+ assert(_Py_IsMainInterpreter(interp));
+ init_static_types(interp, 0);
+ }
return PyModuleDef_Init(&datetimemodule);
} |
Hmm, that feels really hacky. I wouldn't call the movement of |
The crash can be avoided even on pure Python level, and managed static extsension types are already special only for
Not sure about disabling. On Windows, |
Right, there are several ways to fix this that work, but I'm trying to find the fix that is the most correct. Relying on implementation details like
We have to find a balance between correctness and simplicity, and I'm not yet sure which that is. |
This is at least a feature change more than a bug fix. |
Yeah, you're right. I'd be fine with a bandaid fix for 3.14 if you want to make a PR. The |
Still rough example, but checking During the |
|
A small concern is the startup time. Can you do a micro performance benchmark to measure the time cost of |
#136620 is ready for review.
I haven't come up with how to measure so far. |
status = _PyDateTime_InitTypes(interp); | ||
if (_PyStatus_EXCEPTION(status)) { | ||
return status; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status = _PyDateTime_InitTypes(interp); | |
if (_PyStatus_EXCEPTION(status)) { | |
return status; | |
} | |
if (!_Py_IsMainInterpreter(interp)) { | |
status = _PyDateTime_InitTypes(interp); | |
if (_PyStatus_EXCEPTION(status)) { | |
return status; | |
} | |
} |
Reply from #136620 (comment)
Can you run test_concurrent_initialization()
with this change? Based on the crashes that come from the change, I said this PR and my example are "almost equivalent." I'm not sure right now what this PR ensures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what are you trying to achieve here? This will just break the types for the main interpreter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will just break the types for the main interpreter.
Note that test_concurrent_initialization()
does not load the _datetime
in the main inter interpreter at all.
Correction: Run the script of the test without running test_datetime
.
After discussion on DPO, the consensus is that we are okay with making @vstinner, would you mind doing a review? |
@@ -7347,11 +7344,11 @@ init_static_types(PyInterpreterState *interp, int reloading) | |||
for (size_t i = 0; i < Py_ARRAY_LENGTH(capi_types); i++) { | |||
PyTypeObject *type = capi_types[i]; | |||
if (_PyStaticType_InitForExtension(interp, type) < 0) { | |||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but as a follow up I think it would be better to now remove _PyStaticType_InitForExtension and just use _PyStaticType_InitBuiltin for it, there's a lot of special casing that could be removed.
@@ -7379,10 +7376,6 @@ _datetime_exec(PyObject *module) | |||
} | |||
/* We actually set the "current" module right before a successful return. */ | |||
|
|||
if (init_static_types(interp, reloading) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't look at the history of this code in detail but is the reloading logic still necessary or can we remove it from get_current_module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's still used by some macros:
cpython/Modules/_datetimemodule.c
Line 7406 in 0ad304f
if (!reloading) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the macros use the reloading logic to primarily avoid modifying the static types after is initialised once multiple times, you can probably move that to _PyDateTime_InitTypes now and remove it from module exec, it makes sense to have all modifications done to static types done as part of runtime init rather than module init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lib/test/datetimetester.py
Outdated
@@ -3651,6 +3651,25 @@ def test_repr_subclass(self): | |||
td = SubclassDatetime(2010, 10, 2, second=3) | |||
self.assertEqual(repr(td), "SubclassDatetime(2010, 10, 2, 0, 0, 3)") | |||
|
|||
@support.cpython_only | |||
def test_concurrent_initialization(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename this to test_concurrent_initialization_subinterpreters
and add another test where datetime is already imported in main interp and then imported in subinterp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lib/test/datetimetester.py
Outdated
@support.cpython_only | ||
def test_concurrent_initialization_subinterpreter(self): | ||
# Run in a subprocess to ensure we get a clean version of _datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put an anchor instead of the well-known explanation of assert_python_ok()
. Also, move the test to ExtensionModuleTests
(@support.cpython_only
is redundant there). TestDateTime
is the place to test the datetime class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure what you mean by "an anchor".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that. I meant the gh-issue number or the url.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, did both.
Lib/test/datetimetester.py
Outdated
print('a', end='') | ||
with InterpreterPoolExecutor() as executor: | ||
for _ in range(8): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I think this is fine. Many other systems have 8 cores, not 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but it sounds like you are talking about the max_workers
argument rather than the submit count here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_workers
just chooses the number of threads on the system when not set. We could try to submit os.cpu_count()
number of futures, but we don't need to overcomplicate this; we just need something that stresses several subinterpreters trying to import _datetime
concurrently.
Modules/_datetimemodule.c
Outdated
|
||
// `&...` is not a constant expression according to a strict reading | ||
// of C standards. Fill tp_base at run-time rather than statically. | ||
// See https://bugs.python.org/issue40777 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR does not address the possible races in PyDateTime_*.tp_base = &PyDateTime_*Type;
below, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not. I don't think there's an easy way to do this here, because atomically storing tp_base
will continue to race with all the non-atomic reads elsewhere.
Do we even need to load it at runtime like this? We have other examples of directly storing it in the PyTypeObject
structure:
cpython/Objects/methodobject.c
Line 396 in 958657b
.tp_base = &PyCFunction_Type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed now because the module is statically linked, that issue happens only with dynamic loaded modules so you can define it statically now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a _Py_IsMainInterPreter()
check will suffice in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed now because the module is statically linked, that issue happens only with dynamic loaded modules so you can define it statically now.
Ah, TIL. That's definitely the best option here then.
I guess a _Py_IsMainInterPreter() check will suffice in this PR?
For what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ensuring tp_base
is set only once after Py_Initialize()
. But I missed the Kumar 's comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove PyDateTime_DateTimeType.tp_base = &PyDateTime_DateType;
as well.
_datetime
is a special module, because it's the only non-builtin C extension that contains static types. As such, it would initialize static types in the module's execution function, which can run concurrently. Since static type initialization is not thread-safe, this caused crashes. This fixes it by moving the initialization of_datetime
's static types to interpreter startup (where all other static types are initialized), which is already properly protected through other locks._datetime
in sub-interpreters in the same time may crash the process #136421